Overview

Dataset statistics

Number of variables: 13
Number of observations: 487
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 42.9 KiB
Average record size in memory: 90.3 B

Variable types

NUM: 10
BOOL: 2
CAT: 1

Reproduction

Analysis started: 2020-06-24 03:12:59.824444
Analysis finished: 2020-06-24 03:14:03.111559
Duration: 1 minute and 3.29 seconds
Version: pandas-profiling v2.8.0
Command line: pandas_profiling --config_file config.yaml [YOUR_FILE.csv]
Download configuration: config.yaml

Warnings

Gender_Male is highly correlated with Gender_Female (High correlation)
Gender_Female is highly correlated with Gender_Male (High correlation)
df_index has unique values (Unique)
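
The high-correlation warning is what one would expect if Gender_Female and Gender_Male are complementary indicator columns derived from a single gender field: each row then has Gender_Male = 1 - Gender_Female. The sketch below checks this on the ten Gender_Female and Gender_Male values shown under "First rows" in the Sample section. The helper name `pearson_r` is illustrative and is not part of pandas-profiling.

```python
import math


def pearson_r(xs, ys):
    """Pearson's r: co-deviation sum divided by the product of the spreads."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Values copied from the "First rows" sample of this report.
gender_female = [1, 0, 0, 0, 0, 0, 1, 1, 0, 0]
gender_male = [0, 1, 1, 1, 1, 1, 0, 0, 1, 1]

# Each row sets exactly one of the two indicators.
complementary = all(f + m == 1 for f, m in zip(gender_female, gender_male))
r = pearson_r(gender_female, gender_male)
print(complementary, round(r, 6))
```

When the columns are complementary like this, either one can be dropped without losing information.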

Variables

df_index
Real number (ℝ≥0)

UNIQUE

Distinct count: 487
Unique (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 253.3100616016427
Minimum: 0
Maximum: 499
Zeros: 1
Zeros (%): 0.2%
Memory size: 3.8 KiB

Quantile statistics

Minimum: 0
5-th percentile: 25.6
Q1: 128.5
Median: 256
Q3: 377.5
95-th percentile: 474.7
Maximum: 499
Range: 499
Interquartile range (IQR): 249

Descriptive statistics

Standard deviation: 144.1515705
Coefficient of variation (CV): 0.5690716332
Kurtosis: -1.189132483
Mean: 253.3100616
Median Absolute Deviation (MAD): 125
Skewness: -0.0398491358
Sum: 123362
Variance: 20779.67527
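
Several of the df_index statistics above can be derived from the others: the mean is the sum divided by the number of observations, the range is maximum minus minimum, the IQR is Q3 minus Q1, the CV is the standard deviation divided by the mean, and the variance is the squared standard deviation. The sketch below recomputes them from the reported figures as a consistency check. The variable names are illustrative, and the inputs are the rounded values printed in this report, so the results agree only up to rounding.

```python
# Values copied from the df_index statistics in this report.
n = 487                # number of observations
total = 123362         # Sum
minimum, maximum = 0, 499
q1, q3 = 128.5, 377.5
std = 144.1515705      # Standard deviation (as printed, rounded)

mean = total / n                 # reported Mean: 253.3100616
value_range = maximum - minimum  # reported Range: 499
iqr = q3 - q1                    # reported IQR: 249
cv = std / mean                  # reported CV: 0.5690716332
variance = std ** 2              # reported Variance: 20779.67527

print(mean, value_range, iqr, cv, variance)
```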
Histogram with fixed size bins (bins=10)

Common values

Value               Count  Frequency (%)
499                 1      0.2%
193                 1      0.2%
163                 1      0.2%
165                 1      0.2%
166                 1      0.2%
167                 1      0.2%
168                 1      0.2%
169                 1      0.2%
170                 1      0.2%
171                 1      0.2%
Other values (477)  477    97.9%

Minimum 5 values

Value               Count  Frequency (%)
0                   1      0.2%
1                   1      0.2%
2                   1      0.2%
3                   1      0.2%
4                   1      0.2%

Maximum 5 values

Value               Count  Frequency (%)
499                 1      0.2%
498                 1      0.2%
497                 1      0.2%
496                 1      0.2%
495                 1      0.2%

Age
Real number (ℝ≥0)

Distinct count: 70
Unique (%): 14.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 44.70225872689939
Minimum: 4
Maximum: 85
Zeros: 0
Zeros (%): 0.0%
Memory size: 3.8 KiB

Quantile statistics

Minimum: 4
5-th percentile: 18
Q1: 32.5
Median: 45
Q3: 58
95-th percentile: 72
Maximum: 85
Range: 81
Interquartile range (IQR): 25.5

Descriptive statistics

Standard deviation: 16.60357363
Coefficient of variation (CV): 0.3714258319
Kurtosis: -0.6927595706
Mean: 44.70225873
Median Absolute Deviation (MAD): 13
Skewness: -0.05308916481
Sum: 21770
Variance: 275.6786574
Histogram with fixed size bins (bins=10)

Common values

Value               Count  Frequency (%)
60                  31     6.4%
48                  20     4.1%
45                  19     3.9%
50                  19     3.9%
38                  18     3.7%
42                  16     3.3%
65                  16     3.3%
55                  15     3.1%
33                  15     3.1%
75                  14     2.9%
Other values (60)   304    62.4%

Minimum 5 values

Value               Count  Frequency (%)
4                   2      0.4%
6                   1      0.2%
7                   2      0.4%
8                   1      0.2%
11                  1      0.2%

Maximum 5 values

Value               Count  Frequency (%)
85                  1      0.2%
84                  1      0.2%
78                  1      0.2%
75                  14     2.9%
74                  4      0.8%

Total_Bilirubin
Real number (ℝ≥0)

Distinct count: 87
Unique (%): 17.9%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2.6121149897330596
Minimum: 0.4
Maximum: 75.0
Zeros: 0
Zeros (%): 0.0%
Memory size: 3.8 KiB

Quantile statistics

Minimum: 0.4
5-th percentile: 0.6
Q1: 0.8
Median: 0.9
Q3: 2.15
95-th percentile: 10.09
Maximum: 75
Range: 74.6
Interquartile range (IQR): 1.35

Descriptive statistics

Standard deviation: 5.173507721
Coefficient of variation (CV): 1.98058192
Kurtosis: 83.7488455
Mean: 2.61211499
Median Absolute Deviation (MAD): 0.3
Skewness: 7.389036994
Sum: 1272.1
Variance: 26.76518214
Histogram with fixed size bins (bins=10)

Common values

Value               Count  Frequency (%)
0.8                 80     16.4%
0.7                 71     14.6%
0.9                 51     10.5%
0.6                 40     8.2%
1                   23     4.7%
1.1                 17     3.5%
1.8                 11     2.3%
1.4                 11     2.3%
1.3                 11     2.3%
1.7                 10     2.1%
Other values (77)   162    33.3%

Minimum 5 values

Value               Count  Frequency (%)
0.4                 1      0.2%
0.5                 4      0.8%
0.6                 40     8.2%
0.7                 71     14.6%
0.8                 80     16.4%

Maximum 5 values

Value               Count  Frequency (%)
75                  1      0.2%
30.5                1      0.2%
27.2                1      0.2%
23.3                1      0.2%
23.2                1      0.2%

Direct_Bilirubin
Real number (ℝ≥0)

Distinct count: 61
Unique (%): 12.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1.1207392197125257
Minimum: 0.1
Maximum: 14.2
Zeros: 0
Zeros (%): 0.0%
Memory size: 3.8 KiB

Quantile statistics

Minimum: 0.1
5-th percentile: 0.1
Q1: 0.2
Median: 0.3
Q3: 1
95-th percentile: 4.97
Maximum: 14.2
Range: 14.1
Interquartile range (IQR): 0.8

Descriptive statistics

Standard deviation: 2.084303741
Coefficient of variation (CV): 1.85975801
Kurtosis: 14.35144273
Mean: 1.12073922
Median Absolute Deviation (MAD): 0.2
Skewness: 3.589204907
Sum: 545.8
Variance: 4.344322086
Histogram with fixed size bins (bins=10)

Common values

Value               Count  Frequency (%)
0.2                 176    36.1%
0.1                 54     11.1%
0.3                 42     8.6%
0.8                 19     3.9%
0.4                 18     3.7%
0.5                 17     3.5%
0.6                 15     3.1%
1.3                 12     2.5%
1                   12     2.5%
0.7                 11     2.3%
Other values (51)   111    22.8%

Minimum 5 values

Value               Count  Frequency (%)
0.1                 54     11.1%
0.2                 176    36.1%
0.3                 42     8.6%
0.4                 18     3.7%
0.5                 17     3.5%

Maximum 5 values

Value               Count  Frequency (%)
14.2                1      0.2%
12.8                1      0.2%
12.6                2      0.4%
11.8                1      0.2%
11.4                1      0.2%

Alkaline_Phosphotase
Real number (ℝ≥0)

Distinct count: 235
Unique (%): 48.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 297.90143737166323
Minimum: 63
Maximum: 2110
Zeros: 0
Zeros (%): 0.0%
Memory size: 3.8 KiB

Quantile statistics

Minimum: 63
5-th percentile: 140
Q1: 175
Median: 205
Q3: 298
95-th percentile: 750
Maximum: 2110
Range: 2047
Interquartile range (IQR): 123

Descriptive statistics

Standard deviation: 260.4012702
Coefficient of variation (CV): 0.8741188782
Kurtosis: 15.4112214
Mean: 297.9014374
Median Absolute Deviation (MAD): 47
Skewness: 3.566574105
Sum: 145078
Variance: 67808.82154
Histogram with fixed size bins (bins=10)

Common values

Value               Count  Frequency (%)
198                 10     2.1%
182                 9      1.8%
190                 9      1.8%
195                 9      1.8%
145                 8      1.6%
180                 8      1.6%
215                 8      1.6%
298                 8      1.6%
202                 7      1.4%
282                 7      1.4%
Other values (225)  404    83.0%

Minimum 5 values

Value               Count  Frequency (%)
63                  1      0.2%
75                  1      0.2%
90                  1      0.2%
92                  2      0.4%
100                 2      0.4%

Maximum 5 values

Value               Count  Frequency (%)
2110                1      0.2%
1896                1      0.2%
1750                1      0.2%
1630                1      0.2%
1620                1      0.2%

Alamine_Aminotransferase
Real number (ℝ≥0)

Distinct count: 141
Unique (%): 29.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 81.63655030800821
Minimum: 10
Maximum: 2000
Zeros: 0
Zeros (%): 0.0%
Memory size: 3.8 KiB

Quantile statistics

Minimum: 10
5-th percentile: 15
Q1: 23
Median: 33
Q3: 59
95-th percentile: 231.4
Maximum: 2000
Range: 1990
Interquartile range (IQR): 36

Descriptive statistics

Standard deviation: 193.4211628
Coefficient of variation (CV): 2.369296132
Kurtosis: 47.30139426
Mean: 81.63655031
Median Absolute Deviation (MAD): 13
Skewness: 6.412683064
Sum: 39757
Variance: 37411.74623
Histogram with fixed size bins (bins=10)

Common values

Value               Count  Frequency (%)
25                  22     4.5%
20                  18     3.7%
18                  17     3.5%
28                  15     3.1%
30                  15     3.1%
22                  14     2.9%
21                  14     2.9%
15                  13     2.7%
36                  11     2.3%
24                  11     2.3%
Other values (131)  337    69.2%

Minimum 5 values

Value               Count  Frequency (%)
10                  4      0.8%
11                  2      0.4%
12                  7      1.4%
13                  4      0.8%
14                  6      1.2%

Maximum 5 values

Value               Count  Frequency (%)
2000                1      0.2%
1680                1      0.2%
1630                1      0.2%
1350                1      0.2%
1250                2      0.4%

Aspartate_Aminotransferase
Real number (ℝ≥0)

Distinct count: 154
Unique (%): 31.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 108.17043121149898
Minimum: 10
Maximum: 4929
Zeros: 0
Zeros (%): 0.0%
Memory size: 3.8 KiB

Quantile statistics

Minimum: 10
5-th percentile: 15
Q1: 24
Median: 40
Q3: 78
95-th percentile: 400.7
Maximum: 4929
Range: 4919
Interquartile range (IQR): 54

Descriptive statistics

Standard deviation: 309.7203068
Coefficient of variation (CV): 2.863262199
Kurtosis: 136.9981103
Mean: 108.1704312
Median Absolute Deviation (MAD): 19
Skewness: 10.22427137
Sum: 52679
Variance: 95926.66842
Histogram with fixed size bins (bins=10)

Common values

Value               Count  Frequency (%)
23                  15     3.1%
20                  14     2.9%
21                  13     2.7%
30                  12     2.5%
25                  12     2.5%
22                  12     2.5%
29                  11     2.3%
28                  11     2.3%
19                  10     2.1%
40                  10     2.1%
Other values (144)  367    75.4%

Minimum 5 values

Value               Count  Frequency (%)
10                  1      0.2%
11                  2      0.4%
12                  5      1.0%
13                  3      0.6%
14                  8      1.6%

Maximum 5 values

Value               Count  Frequency (%)
4929                1      0.2%
2946                1      0.2%
1600                1      0.2%
1500                1      0.2%
1050                2      0.4%

Total_Protiens
Real number (ℝ≥0)

Distinct count: 57
Unique (%): 11.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 6.459137577002054
Minimum: 2.7
Maximum: 9.6
Zeros: 0
Zeros (%): 0.0%
Memory size: 3.8 KiB

Quantile statistics

Minimum: 2.7
5-th percentile: 4.6
Q1: 5.7
Median: 6.5
Q3: 7.2
95-th percentile: 8.1
Maximum: 9.6
Range: 6.9
Interquartile range (IQR): 1.5

Descriptive statistics

Standard deviation: 1.092959924
Coefficient of variation (CV): 0.1692114327
Kurtosis: 0.2505812934
Mean: 6.459137577
Median Absolute Deviation (MAD): 0.7
Skewness: -0.3276127637
Sum: 3145.6
Variance: 1.194561395
Histogram with fixed size bins (bins=10)

Common values

Value               Count  Frequency (%)
7                   27     5.5%
6                   25     5.1%
6.8                 24     4.9%
6.2                 21     4.3%
7.1                 18     3.7%
6.9                 18     3.7%
8                   17     3.5%
7.2                 17     3.5%
7.3                 16     3.3%
6.1                 16     3.3%
Other values (47)   288    59.1%

Minimum 5 values

Value               Count  Frequency (%)
2.7                 1      0.2%
2.8                 1      0.2%
3                   1      0.2%
3.6                 2      0.4%
3.7                 1      0.2%

Maximum 5 values

Value               Count  Frequency (%)
9.6                 1      0.2%
9.5                 1      0.2%
8.9                 1      0.2%
8.7                 1      0.2%
8.6                 2      0.4%

Albumin
Real number (ℝ≥0)

Distinct count: 39
Unique (%): 8.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3.1778234086242296
Minimum: 0.9
Maximum: 5.5
Zeros: 0
Zeros (%): 0.0%
Memory size: 3.8 KiB

Quantile statistics

Minimum: 0.9
5-th percentile: 1.8
Q1: 2.6
Median: 3.2
Q3: 3.8
95-th percentile: 4.4
Maximum: 5.5
Range: 4.6
Interquartile range (IQR): 1.2

Descriptive statistics

Standard deviation: 0.8010544656
Coefficient of variation (CV): 0.2520764569
Kurtosis: -0.3609524225
Mean: 3.177823409
Median Absolute Deviation (MAD): 0.6
Skewness: -0.1056334395
Sum: 1547.6
Variance: 0.6416882568
Histogram with fixed size bins (bins=10)

Common values

Value               Count  Frequency (%)
3                   34     7.0%
4                   29     6.0%
2.9                 27     5.5%
3.1                 24     4.9%
3.9                 23     4.7%
3.2                 22     4.5%
3.5                 21     4.3%
2.5                 20     4.1%
3.3                 20     4.1%
3.7                 18     3.7%
Other values (29)   249    51.1%

Minimum 5 values

Value               Count  Frequency (%)
0.9                 2      0.4%
1.4                 3      0.6%
1.5                 3      0.6%
1.6                 7      1.4%
1.7                 2      0.4%

Maximum 5 values

Value               Count  Frequency (%)
5.5                 2      0.4%
5                   1      0.2%
4.9                 3      0.6%
4.8                 2      0.4%
4.7                 3      0.6%

Albumin_and_Globulin_Ratio
Real number (ℝ≥0)

Distinct count: 63
Unique (%): 12.9%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.9626899383983574
Minimum: 0.3
Maximum: 1.9
Zeros: 0
Zeros (%): 0.0%
Memory size: 3.8 KiB

Quantile statistics

Minimum: 0.3
5-th percentile: 0.5
Q1: 0.8
Median: 1
Q3: 1.1
95-th percentile: 1.5
Maximum: 1.9
Range: 1.6
Interquartile range (IQR): 0.3

Descriptive statistics

Standard deviation: 0.2923777089
Coefficient of variation (CV): 0.3037091147
Kurtosis: 0.2602027705
Mean: 0.9626899384
Median Absolute Deviation (MAD): 0.2
Skewness: 0.4464170371
Sum: 468.83
Variance: 0.08548472465
Histogram with fixed size bins (bins=10)

Common values

Value               Count  Frequency (%)
1                   95     19.5%
0.8                 54     11.1%
0.9                 47     9.7%
0.7                 41     8.4%
1.1                 38     7.8%
1.2                 31     6.4%
0.6                 27     5.5%
1.3                 25     5.1%
0.5                 19     3.9%
1.4                 17     3.5%
Other values (53)   93     19.1%

Minimum 5 values

Value               Count  Frequency (%)
0.3                 2      0.4%
0.35                1      0.2%
0.4                 7      1.4%
0.45                1      0.2%
0.47                2      0.4%

Maximum 5 values

Value               Count  Frequency (%)
1.9                 1      0.2%
1.85                2      0.4%
1.8                 3      0.6%
1.72                1      0.2%
1.7                 4      0.8%

Liver_Problem
Categorical

Distinct count: 2
Unique (%): 0.4%
Missing: 0
Missing (%): 0.0%
Memory size: 3.8 KiB

Value               Count  Frequency (%)
1                   340    69.8%
2                   147    30.2%

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Gender_Female
Boolean

HIGH CORRELATION

Distinct count: 2
Unique (%): 0.4%
Missing: 0
Missing (%): 0.0%
Memory size: 487.0 B

Value               Count  Frequency (%)
0                   361    74.1%
1                   126    25.9%
 

Gender_Male
Boolean

HIGH CORRELATION

Distinct count: 2
Unique (%): 0.4%
Missing: 0
Missing (%): 0.0%
Memory size: 487.0 B

Value               Count  Frequency (%)
1                   361    74.1%
0                   126    25.9%
 

Interactions

Correlations

Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, implying that for a linear function the angle to the x-axis does not affect r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
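
As a minimal sketch of this calculation, the function below divides the covariance by the product of the standard deviations, using only the Python standard library. The function name `pearson_r` is illustrative, and pandas-profiling itself is not claimed to compute r this way. The inputs are the Total_Bilirubin and Direct_Bilirubin values from the ten "First rows" of the Sample section, so the result describes those ten rows only and not the full dataset.

```python
import math


def pearson_r(xs, ys):
    """Covariance of xs and ys divided by the product of their standard deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    std_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / n)
    std_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / n)
    return cov / (std_x * std_y)


# Values copied from the "First rows" sample of this report.
total_bilirubin = [0.7, 10.9, 7.3, 1.0, 3.9, 1.8, 0.9, 0.9, 0.9, 0.7]
direct_bilirubin = [0.1, 5.5, 4.1, 0.4, 2.0, 0.7, 0.2, 0.3, 0.3, 0.2]

r = pearson_r(total_bilirubin, direct_bilirubin)

# Shifting and positively rescaling one variable leaves r unchanged.
rescaled = [3.0 * y + 5.0 for y in direct_bilirubin]
r_rescaled = pearson_r(total_bilirubin, rescaled)

print(r, r_rescaled)
```

The same divisor (n) is used for the covariance and for both standard deviations, so it cancels; using n - 1 throughout would give the same r.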

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
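
A minimal sketch of this definition in plain Python: replace each value by its rank, then apply the Pearson formula to the ranks. The helper names `average_ranks`, `pearson_r`, and `spearman_rho` are illustrative. Giving tied values the average of their positions is an assumption of this sketch, and the example data is made up to show a nonlinear monotonic relationship.

```python
import math


def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole group of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson_r(xs, ys):
    """Covariance divided by the product of the standard deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def spearman_rho(xs, ys):
    """Pearson's r applied to the rank variables of xs and ys."""
    return pearson_r(average_ranks(xs), average_ranks(ys))


# Illustrative data: y grows monotonically but not linearly with x.
x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]

rho = spearman_rho(x, y)
r = pearson_r(x, y)
print(rho, r)
```

On this data ρ reaches 1 because the ordering is preserved exactly, while r stays below 1 because the relationship is not a straight line.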

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
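
The pair counting can be sketched directly in plain Python. The function name `kendall_tau` and the example data are illustrative. This sketch follows the formula exactly as stated, counting a tied pair as neither concordant nor discordant. Statistical libraries often apply a tie correction instead, so their values can differ when the data contains ties.

```python
from itertools import combinations


def kendall_tau(xs, ys):
    """(concordant pairs - discordant pairs) / total number of pairs."""
    concordant = 0
    discordant = 0
    pairs = list(combinations(range(len(xs)), 2))
    for i, j in pairs:
        # Positive product: both variables order the pair the same way.
        direction = (xs[i] - xs[j]) * (ys[i] - ys[j])
        if direction > 0:
            concordant += 1
        elif direction < 0:
            discordant += 1
        # direction == 0 means a tie in xs or ys; it counts for neither side.
    return (concordant - discordant) / len(pairs)


# Illustrative data: only the last two observations are out of order.
x = [1, 2, 3, 4]
y = [1, 2, 4, 3]

tau = kendall_tau(x, y)
print(tau)
```

With four observations there are six pairs; five are concordant and one is discordant, so τ = (5 - 1) / 6.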

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency and reverts to the Pearson correlation coefficient in case of a bivariate normal input distribution. There is extensive documentation available here.

Missing values

Sample

First rows

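(row) | df_index | Age | Total_Bilirubin | Direct_Bilirubin | Alkaline_Phosphotase | Alamine_Aminotransferase | Aspartate_Aminotransferase | Total_Protiens | Albumin | Albumin_and_Globulin_Ratio | Liver_Problem | Gender_Female | Gender_Male
0 | 0 | 65 | 0.7 | 0.1 | 187 | 16 | 18 | 6.8 | 3.3 | 0.90 | 1 | 1 | 0
1 | 1 | 62 | 10.9 | 5.5 | 699 | 64 | 100 | 7.5 | 3.2 | 0.74 | 1 | 0 | 1
2 | 2 | 62 | 7.3 | 4.1 | 490 | 60 | 68 | 7.0 | 3.3 | 0.89 | 1 | 0 | 1
3 | 3 | 58 | 1.0 | 0.4 | 182 | 14 | 20 | 6.8 | 3.4 | 1.00 | 1 | 0 | 1
4 | 4 | 72 | 3.9 | 2.0 | 195 | 27 | 59 | 7.3 | 2.4 | 0.40 | 1 | 0 | 1
5 | 5 | 46 | 1.8 | 0.7 | 208 | 19 | 14 | 7.6 | 4.4 | 1.30 | 1 | 0 | 1
6 | 6 | 26 | 0.9 | 0.2 | 154 | 16 | 12 | 7.0 | 3.5 | 1.00 | 1 | 1 | 0
7 | 7 | 29 | 0.9 | 0.3 | 202 | 14 | 11 | 6.7 | 3.6 | 1.10 | 1 | 1 | 0
8 | 8 | 17 | 0.9 | 0.3 | 202 | 22 | 19 | 7.4 | 4.1 | 1.20 | 2 | 0 | 1
9 | 9 | 55 | 0.7 | 0.2 | 290 | 53 | 58 | 6.8 | 3.4 | 1.00 | 1 | 0 | 1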

Last rows

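(row) | df_index | Age | Total_Bilirubin | Direct_Bilirubin | Alkaline_Phosphotase | Alamine_Aminotransferase | Aspartate_Aminotransferase | Total_Protiens | Albumin | Albumin_and_Globulin_Ratio | Liver_Problem | Gender_Female | Gender_Male
477 | 490 | 53 | 0.8 | 0.2 | 193 | 96 | 57 | 6.7 | 3.6 | 1.16 | 1 | 1 | 0
478 | 491 | 27 | 1.0 | 0.3 | 180 | 56 | 111 | 6.8 | 3.9 | 1.85 | 2 | 0 | 1
479 | 492 | 35 | 1.0 | 0.3 | 805 | 133 | 103 | 7.9 | 3.3 | 0.70 | 1 | 1 | 0
480 | 493 | 65 | 0.7 | 0.2 | 265 | 30 | 28 | 5.2 | 1.8 | 0.52 | 2 | 0 | 1
481 | 494 | 25 | 0.7 | 0.2 | 185 | 196 | 401 | 6.5 | 3.9 | 1.50 | 1 | 0 | 1
482 | 495 | 32 | 0.7 | 0.2 | 165 | 31 | 29 | 6.1 | 3.0 | 0.96 | 2 | 0 | 1
483 | 496 | 24 | 1.0 | 0.2 | 189 | 52 | 31 | 8.0 | 4.8 | 1.50 | 1 | 0 | 1
484 | 497 | 67 | 2.2 | 1.1 | 198 | 42 | 39 | 7.2 | 3.0 | 0.70 | 1 | 0 | 1
485 | 498 | 68 | 1.8 | 0.5 | 151 | 18 | 22 | 6.5 | 4.0 | 1.60 | 1 | 0 | 1
486 | 499 | 55 | 3.6 | 1.6 | 349 | 40 | 70 | 7.2 | 2.9 | 0.60 | 1 | 0 | 1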